C/C++ Users Group Library, July 1996: vol_100/185_01/ssort.doc (text file, 1985-08-19, 6KB)
SSORT.DOC
SSORT is a merge sort utility. That is, the size of the file to be
sorted is limited by disk space rather than by memory space. As
configured, it allows up to 20 sort keys to be specified (recompile
for more). SSORT has one built-in sort precedence order and has a
command line option for overlaying it with any one of an arbitrary
number of others, taken from a file called "SSORT.OVL". This is
useful, for instance, to select a descending sort.
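To show why a merge sort is bounded by disk rather than memory, here is a minimal C sketch (not SSORT's actual code) of the merge step: two files whose lines are already in order are combined into one output while only one line from each is held in memory. It assumes plain strcmp() ordering and a hypothetical MAXLINE of 256; SSORT itself compares through its collating table, and a full external sort would first write memory-sized sorted runs and then merge them repeatedly.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 256   /* assumed maximum line length for this sketch */

/* Merge two files whose lines are already in ascending order into out.
   Only one pending line per input is kept in memory at any time. */
void merge_runs(FILE *a, FILE *b, FILE *out)
{
    char la[MAXLINE], lb[MAXLINE];
    int ha = fgets(la, sizeof la, a) != NULL;   /* 1 while a has a line */
    int hb = fgets(lb, sizeof lb, b) != NULL;   /* 1 while b has a line */

    while (ha && hb) {
        if (strcmp(la, lb) <= 0) {              /* ties taken from a: stable */
            fputs(la, out);
            ha = fgets(la, sizeof la, a) != NULL;
        } else {
            fputs(lb, out);
            hb = fgets(lb, sizeof lb, b) != NULL;
        }
    }
    /* Copy whatever remains in the run that is not yet exhausted. */
    while (ha) { fputs(la, out); ha = fgets(la, sizeof la, a) != NULL; }
    while (hb) { fputs(lb, out); hb = fgets(lb, sizeof lb, b) != NULL; }
}
```

Lines longer than MAXLINE would be split by fgets(), so a real tool would size the buffer to its longest permitted line.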
This file (SSORT.DOC) explains how to patch the SSORT
system to install your favorite collating sequences. The ones
supplied are: Lexicographical, Reverse Lexicographical, ASCII,
and Reverse ASCII.
As configured, the built-in collating sequence in SSORT
treats all punctuation and non-printing characters as
non-existent (they do not even consume a column). This means that
A!@#$%^&*()B will sort adjacent to AB, for example. It also
treats 'A' and 'a' as distinct but adjacent; that is, it is a
lexicographical sort. This is not REALLY lexicographical on a
string basis. To get that, 'A' and 'a' would need exactly the
same sort precedence, but then several occurrences of the same
string that differ only in case would sort into a scrambled order
among the equivalent strings, for instance:
    AAAAA
    aaaaa
    aaaaa
    AAAAA
    AAAAA
    aaaaa
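The translation-table idea can be sketched in C as follows. The table values are hypothetical (digits, then A a B b ... Z z, with everything else, including space, mapped to 255 and therefore ignored); only the conventions that 255 means "ignore" and that 0 maps to itself come from the patch notes in this file. build_lex_table() and lex_compare() are illustrative names, not SSORT functions.

```c
#include <string.h>

#define IGNORE 255   /* table value meaning "skip this character" */

static unsigned char lex[256];   /* precedence for each character code */

/* Build a hypothetical table: digits first, then 'A' with 'a' right
   after it, and so on through 'Z' 'z'. Everything else is ignored. */
void build_lex_table(void)
{
    int c, rank = 1;
    memset(lex, IGNORE, sizeof lex);
    lex[0] = 0;                                   /* NUL must map to itself */
    for (c = '0'; c <= '9'; c++)
        lex[c] = (unsigned char)rank++;
    for (c = 'A'; c <= 'Z'; c++) {
        lex[c] = (unsigned char)rank++;           /* upper case ...       */
        lex[c - 'A' + 'a'] = (unsigned char)rank++; /* ... then lower case */
    }
}

/* Compare two strings by precedence, skipping characters whose entry
   is IGNORE so that they do not occupy a position at all. */
int lex_compare(const char *s, const char *t)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *q = (const unsigned char *)t;
    for (;;) {
        while (lex[*p] == IGNORE) p++;
        while (lex[*q] == IGNORE) q++;
        if (lex[*p] != lex[*q])
            return lex[*p] < lex[*q] ? -1 : 1;
        if (*p == '\0')          /* both strings ended at the same point */
            return 0;
        p++; q++;
    }
}
```

With this table, lex_compare("A!@#$%^&*()B", "AB") returns 0, while "AB" and "aB" are distinct but adjacent.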
The drawback of the implemented version is that the example data
sorts as:
    SSORT.C
    SSORT.CSM
    ssort.c
since 's' comes after 'S' in the collating sequence, so ssort.c
lands after SSORT.CSM instead of next to SSORT.C. What is really
wanted is "case blindness" on a string (not a character) basis,
but I haven't figured out how to do that yet. Nor have I missed
it much.
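One common way to get string-level case blindness, offered only as a sketch and not something SSORT does, is a two-level comparison: compare with case folded away first, and let case decide only between strings that are otherwise equal. This version uses plain ASCII and tolower() rather than SSORT's table, and caseblind_compare() is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Primary comparison ignores case entirely; only when two strings are
   equal ignoring case is the case-sensitive order used to break the tie,
   so equivalent strings stay grouped in a fixed order. */
int caseblind_compare(const char *s, const char *t)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *q = (const unsigned char *)t;
    while (*p && tolower(*p) == tolower(*q)) {
        p++; q++;
    }
    if (tolower(*p) != tolower(*q))
        return tolower(*p) < tolower(*q) ? -1 : 1;
    return strcmp(s, t);   /* equal ignoring case: case decides */
}
```

With this comparison the example data sorts as SSORT.C, ssort.c, SSORT.CSM, and the AAAAA/aaaaa lines come out as all the AAAAA lines followed by all the aaaaa lines.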
The rules for the precedence orders for characters are contained
in a table in the lexlate() function which is in LEXLATE.CSM.
The file called SSORT.OVL may contain an arbitrary number
of collating sequences (in practice, up to 16384), any one
of which may be overlaid at execution time by using the "-c"
option (see below). As configured, -c0 is reverse lexicographical,
-c1 is ASCII, and -c2 is reverse ASCII.
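Selecting entry n from SSORT.OVL amounts to seeking to that entry and reading one table. The sketch below assumes, without confirmation from this file, that SSORT.OVL is simply a sequence of raw 256-byte tables in the same format as the built-in one; SORTORDR.AQM documents the real layout. load_overlay() is a hypothetical name.

```c
#include <stdio.h>

#define TABLE_SIZE 256   /* one precedence byte per character code */

/* Copy collating sequence number `entry` (0-based, as with -c) into
   table. ASSUMPTION: entries are stored back to back as raw tables.
   Returns 0 on success, -1 on error or if the entry does not exist. */
int load_overlay(const char *path, long entry, unsigned char table[TABLE_SIZE])
{
    FILE *f = fopen(path, "rb");
    int ok;
    if (f == NULL)
        return -1;
    ok = entry >= 0
      && fseek(f, entry * TABLE_SIZE, SEEK_SET) == 0
      && fread(table, 1, TABLE_SIZE, f) == (size_t)TABLE_SIZE;
    fclose(f);
    return ok ? 0 : -1;
}
```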
Usage: ssort <infile> <outfile> [-c<entry number>] [-k<sort key list>]
where -c<entry number> means:
    Use the <entry number>'th collating sequence in SSORT.OVL.
and <sort key list> is:
    A comma-separated list of column numbers or ranges
    specifying the sort key positions.
e.g.
ssort messy.dat neat.dat -c3 -k3-5,7-9,1-2,12
specifies that:
1) The input file is MESSY.DAT.
2) The output file is NEAT.DAT.
3) The collating sequence to be used is number 3 in SSORT.OVL.
   Note that the first sequence in SSORT.OVL is number 0.
4) The primary sort key is columns 3 thru 5.
5) The first secondary sort key is columns 7 thru 9.
6) The next secondary sort key is columns 1 thru 2.
7) The last secondary sort key is columns 12 thru end of line.
A sort key of 1 column may be specified as 3-3 for example.
A sort key which goes to end of line need NOT be the last one.
The leftmost column is numbered 1.
The default sort key is the entire line.
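The -k syntax can be parsed as in this C sketch. parse_keys(), key_field(), struct key and the TO_EOL marker are hypothetical names; the sketch assumes columns count raw characters in the line and keeps the 20-key limit mentioned at the top of this file.

```c
#include <stdlib.h>
#include <string.h>

#define MAXKEYS 20   /* SSORT's configured limit on sort keys */
#define TO_EOL  0    /* hypothetical marker: key runs to end of line */

struct key { int first, last; };   /* 1-based columns, inclusive */

/* Parse a list such as "3-5,7-9,1-2,12"; a bare number n means
   column n through end of line. Key order is kept (keys[0] is the
   primary key). Returns the key count (0 for an empty list, meaning
   the whole line), or -1 if the list is malformed. */
int parse_keys(const char *s, struct key keys[MAXKEYS])
{
    int n = 0;
    while (*s) {
        char *end;
        long a, b = TO_EOL;
        if (n == MAXKEYS)
            return -1;                    /* too many keys */
        a = strtol(s, &end, 10);
        if (end == s || a < 1)
            return -1;                    /* missing or non-positive column */
        if (*end == '-') {                /* range form: first-last */
            s = end + 1;
            b = strtol(s, &end, 10);
            if (end == s || b < a)
                return -1;
        }
        keys[n].first = (int)a;
        keys[n].last = (int)b;
        n++;
        if (*end == ',') {
            if (*++end == '\0')
                return -1;                /* trailing comma */
        } else if (*end != '\0') {
            return -1;                    /* unexpected character */
        }
        s = end;
    }
    return n;
}

/* Point at key k inside line and store its length in *len.
   A line shorter than the key yields an empty key. */
const char *key_field(const char *line, struct key k, size_t *len)
{
    size_t n = strlen(line);
    size_t first = (size_t)k.first - 1;                   /* 0-based start */
    size_t last = k.last == TO_EOL ? n : (size_t)k.last;  /* exclusive end */
    if (first >= n) { *len = 0; return line + n; }
    if (last > n) last = n;
    *len = last - first;
    return line + first;
}
```

For the example above, parse_keys("3-5,7-9,1-2,12", keys) returns 4, with keys[3] running from column 12 to end of line. Lines would then be compared key by key, moving to the next key only when the current one ties.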
Files in SSORT.LBR
SSORT.DQC - This file in squeezed format
SSORT.SH - MicroShell batch file to build SSORT.COM (similar to SUBMIT).
   If you have MicroShell, I probably don't have to tell you not
   to use it, since ASM gets confused when run from a shell file.
   Just treat it as build instructions.
SSORT.CQ - BDS C source for lexsort in a squeezed format
LEXLATE.CQM - BDS C '.CSM' (assembly language) file for the sort
precedence routine.
SSORT.OBJ - SSORT.COM renamed so that it cannot be executed on an RBBS
SSORT.SYM - The symbol table from the L2 linker for LEXSORT.COM
SSORT.OVL - A file of collating sequences (mentioned above)
SORTORDR.AQM - ASM assembly language file (in squeezed format) to generate
   a customized SSORT.OVL; instructions for use are contained
   within it.
PATCH DATA:
1) For those users who would like a different "built-in"
   precedence order but do not have BDS C, the
   combination of LEXLATE.CSM and LEXSORT.SYM shows where
   the table ends up in memory and hence in LEXSORT.COM.
   This is the address of LEXLATE + 0EH. As configured,
   the magic address is 251BH + 000EH = 2529H.
   The rules are that this is a 256-byte table
   corresponding to the 256 possible character codes. A code of 255
is used to indicate that the character should be
ignored entirely. Any other value is the sort
precedence for the corresponding ASCII character.
Make sure that 0 translates to itself so that C will
be able to recognize end of string.
2) If you wish to change the name SSORT.OVL, it is located
   at 24FFH + 8 = 2507H. This is the address of the function
   collate_name() + 8; see entry COLLATE_ in file SSORT.SYM in
   case someone has recompiled SSORT since I wrote this
   documentation file. This address may be patched to contain
   a specific drive and/or user reference according to
   BDS C rules, i.e. uu/d:nnnnnnnn.ttt. The file name must
   be no longer than 19 characters and must be followed by
   a zero byte to terminate the string.
3) If you wish to delete, rearrange, or add to the collating
   sequences in SSORT.OVL, see the instructions in SORTORDR.AQM.
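Items 1 and 2 can be applied without BDS C by a small patch program. This sketch relies on the standard CP/M convention that a .COM file is loaded at 0100H, so a memory address maps to file offset address - 0100H (2529H is byte 2429H of the file). The addresses are those quoted above for the distributed build and must be re-checked against the .SYM file if SSORT has been recompiled; patch_com(), patch_table() and patch_ovl_name() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define COM_BASE 0x100   /* CP/M loads a .COM file at address 0100H */

/* Overwrite n bytes of a .COM file at the given load address.
   File offset = memory address - 0100H. Returns 0 on success. */
int patch_com(const char *path, unsigned addr, const void *data, size_t n)
{
    FILE *f;
    int ok;
    if (addr < COM_BASE || (f = fopen(path, "r+b")) == NULL)
        return -1;
    ok = fseek(f, (long)(addr - COM_BASE), SEEK_SET) == 0
      && fwrite(data, 1, n, f) == n;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Item 1: install a 256-byte precedence table at 2529H.
   255 means "ignore"; entry 0 must stay 0 so NUL still ends a string. */
int patch_table(const char *com, const unsigned char table[256])
{
    if (table[0] != 0)
        return -1;
    return patch_com(com, 0x2529, table, 256);
}

/* Item 2: replace the overlay file name at 2507H.
   At most 19 characters, written together with its terminating zero. */
int patch_ovl_name(const char *com, const char *name)
{
    size_t len = strlen(name);
    if (len > 19)
        return -1;
    return patch_com(com, 0x2507, name, len + 1);
}
```

Work on a copy of the .COM file, since a wrong address silently corrupts the program.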
HISTORY:
SSORT.C is a modification to LEXSORT.C. LEXSORT.C is a
modification of SORT3.C. SORT3.C is Leor Zolman's translation of
Ratfor code, published in "Software Tools", into BDS C.
Both LEXSORT and SSORT modifications were done by Harvey Moran.
If you have any comments about SSORT (good or bad), I can be
reached through the BHEC RBBS (Baltimore Heath Electronic Center
Remote Bulletin Board Service). The phone number is (301)
661-2175. This system operates at 300 and 1200 baud, 24 hrs a
day, but 6:00-8:00 am Eastern Time is reserved for system
maintenance. I am one of the SYSOPs on this system.
Harvey Moran
2/26/84